Abstract 

Background: Protein-DNA interactions play a crucial role in the life of biological organisms in controlling 
transcription, regulation, as well as DNA recombination and repair. The deep understanding of these processes, 
which requires the atomic description of the interactions occurring between the proteins and their DNA partners, is 
often limited by the absence of a 3D structure of such complexes. 

Results: In this study, using a method combining sequence homology, structural analogy modeling and 
biochemical data, we first build the 3D structure of the complex between the poorly-characterized PerR-like 
regulator Slr1738, which controls the defences against metal and oxidative stresses in Synechocystis, and its 
target DNA. In a second step, we propose an expanded version of the Slr1738-DNA structure, which 
accommodates the DNA binding of Slr1738 multimers, a feature likely operating in the complex Slr1738-mediated 
regulation of stress responses. Finally, in agreement with experimental data, we present a 3D structure of the 
Slr1738-DNA complex resulting from the binding of multimers of the FUR-like regulator onto its target DNA that 
possesses internal repeats. 

Conclusion: Using a combination of different types of data, we build and validate a relevant model of the 
three-dimensional structure of a biologically important protein-DNA complex. Then, based on published observations, 
we propose more elaborate multimeric models that may be biologically important to understand molecular 
mechanisms. 



Background 

DNA-binding proteins play a crucial role in many fun- 
damental biological processes including transcription, 
regulation, as well as DNA replication and repair. Thus, 
a better understanding of DNA-protein interactions has 
both a fundamental research interest and an applied 
importance in medicine (development of drugs interfer- 
ing with oncogene expression) and biotechnology 
(genetic engineering of microbial organisms). 

In the past, a lot of effort has been made to under- 
stand the basic principles that govern the specificity of 
protein-DNA interactions. It appeared that there is no 
simple recognition code linking the DNA-interacting 
amino acids of a protein with their target DNA 
nucleotides [1]. Furthermore, there are currently no standard 
methods to build a 3D-structure model for the represen- 
tation of a DNA-protein complex, unlike what occurs 
for protein-protein interactions [2]. All current methods 
for predicting the structures of protein-DNA complexes 
use the features of the unbound protein and DNA 
partners and various algorithms (shape complementarity, 
surface properties, experimental contacts...) to drive the 
docking, and propose a model for the studied protein- 
DNA complexes. By contrast, in this study, we used the 
experimentally-determined structures of protein-DNA 
complexes that are presumably similar to the one we 
study, to build a model representation of its possible 
structure. For this purpose, we selected, among the 
DNA-protein complexes available in the PDB database, 
those sharing secondary structure motif analogy with 
our protein of interest, irrespective of the sequence 
homology between these reference proteins and our 
studied protein. This strategy aims to preserve the 
structural conformations required to establish the 
interactions between amino acids and nucleotides in the 
model complexes. In order to test the feasibility of this 
strategy, we applied it to manually build a reliable 
model of the complex occurring between Slr1738, an 
important but structurally uncharacterized member of 
the family of PerR transcription regulators, and its target 
DNA. The PerR family of regulators belongs to the lar- 
ger family of bacterial FUR regulators (ferric uptake reg- 
ulator), which control the responses to iron or zinc 
availabilities [3], for a review see [4]. PerR was initially 
characterized as the master regulator of the Bacillus 
subtilis responses to hydrogen peroxide [5], which regu- 
lates the anti-oxidant genes encoding the DNA binding 
protein MgrA, the catalase KatA, the alkyl hydroperox- 
ide reductase AhpCF, PerR itself, and FUR, in accor- 
dance with the interplay between iron homeostasis and 
protection against oxidative stress. PerR is a small 
dimeric protein that contains two metal ions per mono- 
mer and binds to AT-rich DNA motifs of the promoter 
region of its target genes. One metal-binding site 
coordinates a zinc ion that plays a structural role, while the 
second site binds the regulatory metal, Fe²⁺ (PerR-Zn-Fe) 
or Mn²⁺ (PerR-Zn-Mn). PerR senses H2O2 through 
the Fe-catalyzed oxidation of its H37 or H91 amino acid 
residues, leading to dissociation of the PerR-DNA 
complex [6]. The recent crystallographic structures of the 
PerR protein, though not complexed to its target DNA, 
suggested possible mechanisms by which PerR 
undergoes similar conformational changes upon binding either 
Fe or Mn. The structure of the PerR-Zn protein lacking 
Fe [7] reveals the two CXXC motifs involved in the 
tetrathiolate coordination of Zn²⁺ that stabilizes the PerR 
dimer in a flat conformation poorly suited to bind DNA. 
The structures of the fully metalated proteins PerR-Zn-Mn 
and PerR-Zn-Fe indicate that the binding of either 
Mn²⁺ or Fe²⁺ ions, likely to the same pair of regulatory 
sites on the dimer, leads to a caliper-like closed 
conformation better suited to bind DNA [5,8]. 

PerR-like regulators occur in a wide variety of prokar- 
yotic organisms, including cyanobacteria, which are 
important for the Biosphere in producing a large part of 
the atmospheric oxygen and the biomass for the food 
chain [9], and have promising biotechnological 
potentials [10-12]. By their nature, cyanobacteria are 
frequently challenged by the intrinsically related oxidative 
and iron stresses, as they perform the two main 
iron-requiring, oxidant-generating processes, respiration and 
photosynthesis [13]. In this study, we pursued the 
analysis of the PerR-like regulator Slr1738, which controls 
the responses to oxidative and metal stresses in the 
widely-used model cyanobacterium Synechocystis 
PCC6803 [14-16]. We identified the transcription start 
site and the crucial -10 promoter element for the two 
oppositely oriented genes slr1738 and sll1621 (sll1621 
encodes the antioxidant enzyme AhpC 
(alkylhydroperoxidase [13])). We also characterized the long (33 bp) 
AT-rich motif involved in the Slr1738-mediated 
repression of sll1621. We also built a 3D 
structural model of Slr1738 complexed with its AT-rich 
target DNA. This model should help to 
decipher the molecular mechanisms operating in the tight 
interplay between iron homeostasis and tolerance to 
metal and oxidative stresses. Furthermore, starting from 
our protein dimer model, we also propose more 
speculative complex, i.e. oligomeric, structures (tetramer, 
hexamer, etc.) as possible molecular effectors of the 
numerous regulations controlled by Slr1738 [17]. 

Methods 

Experimental work 

Determination of the transcription start sites of the 
divergently transcribed slr1738 and sll1621 genes 

Total RNAs were isolated from Synechocystis and 
treated with shrimp alkaline phosphatase (SAP), which does 
not affect full-length mRNAs, which have 5'-triphosphate 
ends, but dephosphorylates degraded RNAs, which have 
a 5'-monophosphate extremity. The RNAs were then treated 
with tobacco acid pyrophosphatase (TAP), which 
converts the 5'-triphosphate of full-length mRNAs into a 
5'-monophosphate, but does not modify the 5'-OH of the 
degraded RNAs. The 5'-monophosphate extremity 
of the full-length mRNAs was then ligated to an RNA anchor 
with T4 RNA ligase, and the resulting chimeric 
RNA was reverse transcribed with a gene-specific 
primer, thus creating the first strand of cDNA. This cDNA 
strand was amplified by PCR using both the DNA 
version of the RNA anchor at the 5' extremity and the 
gene-specific primer at the 3' side. Finally, the 
PCR-amplified DNA was sequenced to determine the nature 
of the nucleotide immediately downstream of the DNA 
anchor oligonucleotide, which corresponds to the 
transcription start site (TSS). 
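
The last step of this protocol, reading the nucleotide that immediately follows the anchor in the sequenced PCR product, can be sketched as below. This is only an illustration: the function name, the anchor and the read are hypothetical placeholders, not the oligonucleotides or sequences used in this work.

```python
def tss_from_race_read(read: str, anchor: str) -> tuple[int, str]:
    """Return the 0-based index and identity of the first base after the anchor.

    In a 5' RACE product, the base that directly follows the DNA version of
    the RNA anchor corresponds to the transcription start site (+1).
    """
    read, anchor = read.upper(), anchor.upper()
    start = read.find(anchor)
    if start == -1:
        raise ValueError("anchor sequence not found in read")
    tss_index = start + len(anchor)
    if tss_index >= len(read):
        raise ValueError("read ends at the anchor; no transcript sequence follows")
    return tss_index, read[tss_index]


# Hypothetical anchor and sequencing read, for illustration only.
example_anchor = "GATCCACTAGT"
example_read = "NNNN" + example_anchor + "ATTCGGCA"
print(tss_from_race_read(example_read, example_anchor))
```

In practice the read would also have to be mapped back onto the genome to convert this position into a coordinate relative to the start codon.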

Construction of transcriptional fusions to the cat reporter 
gene and CAT assay 

The slr1738 promoter region and segments thereof were 
amplified by PCR, using site-specific oligonucleotides 
that flanked the PCR DNA product with SnaBI 
blunt-ended restriction sites in such a way that all nucleotide 
substitutions were eliminated upon cleavage with SnaBI. 
The resulting blunt-ended promoter fragments were 
cloned in the unique SnaBI site of the pSB2A promoter 
probe vector [18], i.e. in front of its promoter-less cat 
reporter gene. The sequence of every promoter insert 
was verified (Big Dye kit; ABI Perkin-Elmer) before and 
after replication in Synechocystis. Then, 1-2 x 10^ 



Garcin et al. BMC Structural Biology 2012, 12:1 
http://www.biomedcentral.com/1472-6807/12/1 






reporter cells grown on standard plates up to mid-log 
phase culture were rapidly harvested and disrupted with 
an Eaton press, prior to CAT assay [19]. CAT activities 
are the mean value of three measurements performed 
on two independent cellular extracts; 1 CAT unit = 1 
nmol of chloramphenicol acetylated min⁻¹ mg⁻¹ of 
protein. 
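
The unit definition above translates directly into a small calculation. The sketch below is illustrative only: the function names and the numerical inputs are hypothetical and are not measurements from this study.

```python
from statistics import mean


def cat_units(nmol_acetylated: float, minutes: float, mg_protein: float) -> float:
    """CAT specific activity: nmol chloramphenicol acetylated per min per mg protein."""
    if minutes <= 0 or mg_protein <= 0:
        raise ValueError("assay time and protein amount must be positive")
    return nmol_acetylated / (minutes * mg_protein)


def mean_cat_activity(measurements):
    """Mean specific activity over (nmol, minutes, mg protein) measurements."""
    return mean(cat_units(*m) for m in measurements)


# Hypothetical triplicate: (nmol acetylated, assay minutes, mg protein).
triplicate = [(100.0, 10.0, 0.5), (110.0, 10.0, 0.5), (90.0, 10.0, 0.5)]
print(mean_cat_activity(triplicate))
```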

Modeling work 

Slr1738 monomer construction 

The homology model for the monomer of the 
Synechocystis PerR-like regulator Slr1738 was obtained by 
aligning its sequence with that of the closely 
related FUR protein of Pseudomonas aeruginosa, whose structure is known (PDB 
ID 1MZB) [20], and was built using the MODELLER program. 
Completion of the starting structure of the PerR-like 
model was achieved with the xLeap module of the Amber 9 
suite, to finally obtain the correct 139-residue protein 
sequence (Figure 1A). The final structural model of the 
PerR-like protein was obtained by short MD relaxation. 



Metallic sites parametrization 

Considering the nature of the amino acid ligands of the 
two potential metal-binding sites (using analogy to the 
family of FUR regulators and its PerR-type sub-family), 
the Slr1738 protein should likely be a PerR type 
containing one iron atom and one zinc atom per monomer. 
Zinc site forcefield parameters have already been 
proposed for Amber [21,22], so we used a combination of 
these data and of the tetrahedral geometry of a 
four-cysteine zinc site in a metallothionein (PDB ID 1JJD) 
[23] as a structural reference to set the zinc site 
parameters. The four cysteines C95, C98, C134 and C137 of 
Slr1738 likely involved in metal coordination 
(determined by CxxC pattern analogy to the PerR regulator) are 
covalently bonded (i.e. explicit bonded terms in the 
forcefield) to the zinc atom via their sulfur atom. 

Figure 1 Protein Slr1738 features. (A) Slr1738 primary sequence (N-terminal domain in blue and C-terminal domain in green) and predicted 
secondary structure elements (α-helix in purple and β-strands in yellow). Metallic ligands are highlighted in red for the iron site and in green for the zinc 
site. (B) 3D monomeric and (C) dimeric model of Slr1738 in ribbon representation (iron and zinc ions are represented as red and 
orange van der Waals spheres, respectively). 

The iron site is not sufficiently defined for a rigorous 
semi-empirical treatment at simple atomic resolution. 
Iron should be in its ferrous state [24] and penta- or 
hexa-coordinated to histidine (via deprotonated pyrrole 
nitrogen) and aspartic acid residues [8]. We chose to 
model the metal ion as a hard sphere with a +2.0 raw 
charge and a 1.2 Å van der Waals radius. The 
coordination site was set via harmonic distance constraints (100 
kcal mol⁻¹ Å⁻²) between iron and its amino acid ligands. 
Despite inducing potential local distortions, the 
overestimation of the charge should prevent dissociation of the 
metal ion from its binding site and preferentially 
stabilize the folded conformation. In our initial model, iron 
presents a tetrahedral geometry (H32, E80, H89 and 
E100 in the Pseudomonas aeruginosa FUR regulator are 
respectively H36, D84, H92 and D103 in the 
Synechocystis Slr1738 regulator) resulting from the experimental 
Fe/Zn-substituted structure. A square-based bipyramidal 
geometry is reached with the coordination of the histidine H90 
residue and the addition of a water molecule that could 
be crucial for the function of the protein [6]. 
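
As an illustration of the harmonic distance constraints applied between iron and its ligands, the penalty for a given metal-ligand distance can be sketched as follows. We assume here an Amber-style functional form, E = k (r - r0)^2 without a factor 1/2, and read the force constant of 100 quoted above as being in kcal/mol per Å squared; the target distance used in the example is hypothetical.

```python
def harmonic_restraint_energy(distance: float, target: float, k: float = 100.0) -> float:
    """Harmonic distance penalty, E = k * (r - r0)**2.

    distance and target are in Å; k is assumed to be in kcal/mol per Å squared.
    The absence of a 1/2 prefactor is an assumption about the convention used.
    """
    return k * (distance - target) ** 2


# Hypothetical Fe-N distance of 2.3 Å restrained towards a 2.1 Å target.
print(harmonic_restraint_energy(2.3, 2.1))
```

With such a stiff constant, a deviation of 0.2 Å already costs about 4 kcal/mol, which is why the restraint keeps the ligands close to the metal during relaxation.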

Molecular mechanics and molecular dynamics (MD) simulations 

All simulations were performed using the Parm99 
forcefield and programs of the Amber 9 suite [25]. Molecules were 
neutralized with Na⁺ ions and placed in TIP3P water 
boxes for explicit solvation. After energy minimisations, 
models were used to initiate MD simulations. Short MD 
simulations (5 ns for the production period) were 
principally used to allow accelerated geometry optimisation after 
structure modifications. This also provided qualitative 
information about the stability of the models and the flexible 
regions of a structure. Final models were obtained after 
geometric averaging over the whole stabilized trajectories 
followed by energy minimization in a solvent box. 

Energy association and contact surface calculations 

Binding free energies between DNA and protein 
molecules were estimated using the MM-PBSA method [26]. 
This method has been used with success for several biological 
protein-ligand complexes [27-29], though the final values 
need to be interpreted with caution due to 
approximations in the entropic contributions. To limit this problem, 
we compared only the relative, not absolute, values 
between the different complexes. All complexes compared 
were strictly comparable in terms of number of 
atoms. Contact surfaces were computed with 
the MSMS program [30]. It gives access to the solvent 
accessible surface area (SASA), which was used to 
calculate the contact surface between a receptor and 
its ligand using the formula reported below. 

$\mathrm{SASA}_{rec} + \mathrm{SASA}_{lig} - \mathrm{SASA}_{cplx}$ 
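
The formula above can be applied to three SASA values (receptor alone, ligand alone, and complex) as in the following minimal sketch. The function name and the numbers are hypothetical; note also that some authors divide this buried area by two to report a per-partner interface, which the formula as printed does not do.

```python
def contact_surface(sasa_receptor: float, sasa_ligand: float, sasa_complex: float) -> float:
    """Contact surface in Å squared: SASA(receptor) + SASA(ligand) - SASA(complex)."""
    return sasa_receptor + sasa_ligand - sasa_complex


# Hypothetical SASA values in Å squared, for illustration only.
print(contact_surface(9000.0, 7000.0, 14200.0))
```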



Three dimensional construction of the [(Slr1738-Zn-Fe)2- 
DNA] complex 

Choice of the template structure We built a 
Slr1738-DNA complex by structural analogy with the 239 PDB 
structures of DNA-protein complexes involving 
transcription factors. Among those, we considered only the 
DNA-binding proteins harbouring a presumptive 
helix-turn-helix (HTH) motif predicted by the Pfam database. 
HTH motifs are known to vary widely in sequence over 
the whole DNA-binding domain and their relationships 
can often only be based on structure similarity [31]. 
Finally, we retained 4 structures [32-35] because they 
complied with the following qualitative criteria 
(Additional file 1 Table S1) found in FUR proteins [20]: 1) 
the size of the double-stranded DNA target site is > 20 
bp; 2) to be complexed with DNA the protein must be 
homodimeric; 3) the HTH motif (ca. 25 residues in 
length) must be followed by two anti-parallel β-strands 
(for information, the sequence alignment of the HTHw 
motifs is given in Additional file 2 Figure S1); 4) the 
resolution of the experimental X-ray structure must be 
good. Note that, as observed for the FUR dimer of 
Escherichia coli [36], the two recognition helices H4 are 
almost perpendicular and thus should be positioned on 
both sides of the DNA global axis, and not on the same 
side as occurs with parallel helices (Additional file 3 
Figure S2). 
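
The four qualitative criteria listed above amount to a simple filter over candidate template structures, sketched below. Everything in this example is hypothetical: the record fields, the candidate entries, and in particular the 3.0 Å resolution cutoff, since the text only requires the resolution to be "good" without giving a threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateCandidate:
    name: str
    dna_length_bp: int
    is_homodimer: bool
    hth_then_two_antiparallel_strands: bool
    resolution_angstrom: float


def passes_criteria(c: TemplateCandidate, max_resolution_angstrom: float = 3.0) -> bool:
    """Apply criteria 1-4; the resolution cutoff is an assumed placeholder."""
    return (
        c.dna_length_bp > 20
        and c.is_homodimer
        and c.hth_then_two_antiparallel_strands
        and c.resolution_angstrom <= max_resolution_angstrom
    )


def select_templates(candidates, max_resolution_angstrom: float = 3.0):
    """Return the names of the candidates that satisfy all four criteria."""
    return [c.name for c in candidates if passes_criteria(c, max_resolution_angstrom)]


# Placeholder candidates, not real PDB entries.
candidates = [
    TemplateCandidate("candidate_A", 25, True, True, 2.4),
    TemplateCandidate("candidate_B", 16, True, True, 2.0),   # DNA too short
    TemplateCandidate("candidate_C", 22, False, True, 2.1),  # not homodimeric
    TemplateCandidate("candidate_D", 30, True, True, 3.6),   # resolution too low
]
print(select_templates(candidates))
```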

Fitting procedure Using successive energy minimisation 
steps, the Slr1738 monomer was fitted onto each of the 
4 reference complexes by positioning the H4 recognition 
helix in the major groove of the target DNA. This global 
protein-DNA association, driven by distance restraints, 
was performed with frozen Slr1738 secondary structure 
and ended after reproducing the geometry observed in 
the selected structural patterns. We tested four kinds of 
superposition motifs to select the best one enabling the 
largest surface of protein/DNA contacts, as shown in 
Additional file 4 Table S2. We note that the contact 
surface differences between the template structures reflect 
differences in the DNA sequences. Then, we replaced 
the template nucleotides in the model by the Slr1738 
DNA-binding sequence while maintaining the 
phosphodiester backbone of DNA. After geometry optimization, 
we selected only three structures on the quality of their 
surface contacts and association energies (Table 1). The 
final three models were derived from different PDB 
structures (1SAX, 1U8R and 1Z9C). 

Table 1 DNA-protein complexes parameters of the structures built with the target DNA sequence. 

Superposition type   H4S1S2                                         H3H4S1S2 
PDB name             Contact surface (Å²)   Association energy      Contact surface (Å²)   Association energy 
                                            with DNA (21 bp)                               with DNA (21 bp) 
1C0W                 900                    35.4                    888                    19.03 
1SAX                 890                    23.92                   946                    13.18 
1U8R                 949                    25.88                   885                    7.18 
1Z9C                 1006                   -6.82                   939                    11.14 

Association energies and contact surfaces of 3D models with different secondary structure motifs superpositions for the construction of the DNA-(Slr1738)2 
complex. Best values are in bold. 

Closure of the structure The final step of our protocol 
was to close the protein-DNA complex by positioning 
the second monomer of Slr1738 onto its DNA target. 
This was done using a two-step method. First, we 
matched the dimer Slr1738 structure to the protein/DNA 
complex where both Slr1738 monomers were properly 
positioned. In this way, we conserved the integrity of the 
Slr1738 protein structure and the recognition helices 
remained close to their optimal positions. Second, we 
applied distance restraints derived from structural 
HTHw patterns on both monomers in order to insert the 
Slr1738 recognition helices into the major groove of 
DNA. The integrity of the sugar-phosphate DNA 
backbone, the dimer interface and protein secondary 
structures was maintained under harmonic restraints 
throughout the procedure. The internal energies of the 
complexes were finally minimized with decreasing 
harmonic restraints to ensure a smooth transition of the 
atomic system toward a relaxed configuration. The 
procedure was completed with short MD simulations in 
solvated and neutralized conditions. Contact surfaces and 
the association energies of the three best-predicted 
models are presented in Table 2. Structures at different steps 
of our construction strategy are presented in Figure 2. 

Results and Discussion 

In vivo analysis of the slr1738 and sll1621 divergent 
promoters: evidence for repression by Slr1738 and role of 
its AT-rich DNA binding motif 

We pursued the analysis of the PerR-like regulator 
Slr1738 that controls the responses to oxidative and 
metal stresses in the model cyanobacterium 
Synechocystis PCC6803 [14-16]. It was reported [37] that the 
deletion of slr1738 increases the expression of the two 
oppositely-oriented genes (http://genome.kazusa.or.jp/cyanobase) 
slr1738 itself and sll1621, which encodes the 
anti-oxidant peroxiredoxin enzyme AhpC. It has also 
been shown that the Slr1738 protein binds the 300 bp 
sll1621-slr1738 intergenic region [14-16], which 
possesses a long (30 bp) DNA motif containing only A and 
T nucleotides (the AT-only motif). Together, these 
findings suggested that Slr1738 represses both the 
slr1738 and sll1621 promoters. To test this 
interpretation, and the possible role of the long AT-only DNA 
motif occurring in the sll1621-slr1738 promoter region, 
we performed the following experiments. We cloned the 
300 bp sll1621-slr1738 promoter region, in both 
orientations relative to the promoter-less cat reporter gene of 
our promoter-probe plasmid vector pSB2A, which 
replicates in Synechocystis at about 10 copies per cell, i.e. at 
one copy per copy of the polyploid chromosome [38]. 
This generated the reporter plasmids psll1621-cat and 
pslr1738-cat, which replicated stably in Synechocystis, as 
expected (data not shown), where they directed similar 
levels of cat expression (Figure 3). As the usual control, 
we verified that the empty pSB2A plasmid carrying no 
promoter insert produced no CAT activity. Collectively, 
these data showed that the sll1621 and slr1738 
promoters have similar strengths, which resemble those of 
other Synechocystis genes we previously studied with 
pSB2A ([39] and references therein). 

Table 2 DNA-protein complexes parameters of the final 
structures. 

Structure name     Contact surface (Å²)   Association energy with DNA (kcal/mol) 
1SAX (H3H4S1S2)    1852                   36 
1U8R (H3H4S1S2)    1083                   51.23 
1Z9C (H4S1S2)      1230                   50.45 

Association energies and contact surfaces of 3D models obtained for the final 
structures of the DNA-(Slr1738)2 complex. Best values are in bold. 

Figure 2 Different steps of the 3D model construction of the DNA-(Slr1738)2 complex. A - Superposition of the [protein-DNA] X-ray structure with 
(Slr1738)2 based on recognition helix H4 of monomer A. B - Structure of (Slr1738)2 docked to X-ray DNA (monomer A is well-positioned). C - 
DNA sequence modification. D - Structure of the Slr1738 monomer A (orange) well docked to the DNA molecule and ideal positioning of the 
monomer B (yellow). E - Matching on the DNA sequence of both structures in D, which gives rise to a bad dimerization interface for Slr1738. F - 
(Slr1738)2 with a well-defined dimerization interface, in green, matched on the protein structure E, in light orange. G - Minimization protocol 
consisting in the application of harmonic distance restraints on both DNA recognition helices. H - Final structure. 

We and others [17] tried to find consensus sequences 
in the promoter regions of the wealth of genes 
presumably regulated by Slr1738. However, we found none 
(Additional file 5 Figure S3), as occurred in the case of 
other PerR-like regulators [4,5,40], suggesting that 
particular DNA structures rather than defined nucleotide 
motifs in the target DNA govern PerR regulation of its 
target genes. This context encouraged us to perform a 
mutational analysis of the slr1738 and sll1621 promoters 
to identify their cis-acting promoter elements. In the 
slr1738 promoter, we studied the 5'-TAagcT-3' hexamer 
that matches the canonical -10 box of σ70-type 
Escherichia coli promoters [41] in both sequence (5'-TATAAT-3') 
and position (-12 to -7) from the transcription start 
site, which we presently mapped with the classical 5' 
RACE technique [42] that works well in Synechocystis 
[43]. It is the A nucleotide we noted as +1, which is 
located 131 bp upstream of the slr1738 start codon (see 
Additional file 6 Figure S4). Consistent with its 
identification as the -10 promoter box, we found the 5'-TAagcT-3' 
element to be crucial to slr1738 transcription. Indeed, the 
transversion mutagenesis of its proximal T nucleotide 
(5'-TAagcT-3' to 5'-GAagcT-3') completely abolished 
slr1738 promoter activity (Figure 3), as occurred with the 
-10 promoter boxes of other Synechocystis genes 
[19,39,44-46]. By contrast, no sequence resembling a -35 
promoter box (5'-TTGACA-3') was found upstream of 
the slr1738 -10 box, either at 17 bp (i.e. the canonical 
Escherichia coli spacing [47]) or at 30 bp (i.e. the distance 
sometimes encountered in Synechocystis [19,44,45]). The 
absence of a -35 promoter box in a Synechocystis 
promoter is not unprecedented, as we previously showed that 
both the lexA and recA genes are well expressed from 
-35-less promoters [39]. Furthermore, as σ70-type 
promoters lacking a -35 box often possess an "extended -10 
box" (5'-TGnTATAAT-3') mediating all contacts with 
the RNA polymerase σ70 factor [47,48], it is worth 
noting that the slr1738 promoter harbors such an "extended 
-10 box" (5'-TGnTAagcT-3'), like the Synechocystis secA 
promoter [19]. In contrast, the sll1621 promoter 
appeared to possess the two canonical boxes, i.e. a -10 
element (5'-TAttcT-3'; Figure 3) located 6 nucleotides 
upstream of the transcription start site (the A nucleotide 
noted as +1, which we found to be located 73 bp upstream of the 
ATG start codon; see Additional file 6 Figure S4) and a 
-35 (5'-TTGACt-3') box located 17 nucleotides upstream 
of the -10 element. 
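
The upper/lower-case notation used above for promoter elements, in which bases matching the Escherichia coli consensus are written in upper case, can be reproduced with a short helper. The sketch below is only illustrative; the function names are ours, and it is checked here on the slr1738 -10 hexamer and on the sll1621 -35 box quoted in the text.

```python
def mark_consensus(site: str, consensus: str) -> str:
    """Write bases identical to the consensus in upper case, others in lower case."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return "".join(
        s.upper() if s.upper() == c.upper() else s.lower()
        for s, c in zip(site, consensus)
    )


def count_matches(site: str, consensus: str) -> int:
    """Number of positions at which the site agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(s.upper() == c.upper() for s, c in zip(site, consensus))


print(mark_consensus("TAAGCT", "TATAAT"), count_matches("TAAGCT", "TATAAT"))
print(mark_consensus("TTGACT", "TTGACA"), count_matches("TTGACT", "TTGACA"))
```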

We also introduced and tested the slr1738-cat and 
sll1621-cat reporter plasmids in the Δslr1738 deletion 
mutant we previously constructed [14]. We found the 
slr1738 promoter to be more active in the absence of 
the Slr1738 protein, demonstrating that Slr1738 is an 
autorepressor (Figure 3). Similarly, the sll1621 promoter 
appeared to be more active in the absence of the 
Slr1738 protein (Figure 3), demonstrating that the 
Slr1738-mediated downregulation of the sll1621 gene 






Figure 3 Mutational analysis of the slr1738 and sll1621 promoter region transcriptionally fused to the cat reporter gene. The 
transcription start sites (TSS, +1) are represented as bent arrows pointing in the direction of the transcription of slr1738 (top DNA strand) and 
sll1621 (bottom DNA strand), respectively. The length of the cyanobacterial DNA segment between the TSS and the cat reporter gene is 
indicated as +120 (slr1738-cat fusion) and +73 (sll1621-cat fusion). Nucleotide substitutions or deletions in the promoter sequences are written in 
bold upper case or represented with triangles, respectively. The -35 and -10 promoter elements are boxed and shaded in gray, like the AT-rich 
Slr1738-binding region (Figure 5). The CAT activities determined in wild-type or Δslr1738 mutant are the average values calculated from at least 
3 independent experimental repeats (standard deviations were less than 10% of sample averages). The present data indicate that this AT-rich 
Slr1738-binding region operates in the Slr1738-mediated negative regulation of the sll1621 promoter activity. 



[14-16] is exerted at the level of its promoter activity. 
Furthermore, we also found that the above-mentioned 
long (33 bp) AT-rich motif, which we anticipated to 
interact with Slr1738, truly operates in the 
Slr1738-mediated repression of the sll1621 promoter (Figure 3). 

Structure modelisation of (Slr1738-Zn-Fe)2 protein 
complexed to DNA 

At the beginning of our present study of the PerR-like 
regulator Slr1738, no structure of the metal-containing 
form of a PerR regulator was available in data banks. 
Therefore, starting from the Slr1738 primary sequence 
(Figure 1A), we built its tertiary structure using 
homology modelling methods, energy minimization and short 
MD simulation. Psi-Blast analysis showed that Slr1738 
exhibits 21% sequence identity (37% for homology) with 
the Pseudomonas aeruginosa FUR protein for which a 
crystal structure was available (PDB ID 1MZB) [20]. 
Based on this and other findings and on our metal 
titration experiments (data not shown), we completed the 
two Slr1738 metal binding sites with both a zinc ion 
(Zn²⁺) and a ferrous ion (Fe²⁺). These two metal sites 
were described as essential for the folding and activity 
of genuine PerR and FUR regulators [4,6-8,49]. In our 
Slr1738 model, highly conserved cysteines (C95, C98, 
C134 and C137) operate in the coordination of the 
crucial zinc atom (Additional file 7 Figure S5A). The 
iron-binding site displays a hexa-coordinated geometry (H36, 
H90, H92, D84, D103 and a water molecule) and 
the anchoring of the N-terminal domain of Slr1738 on 
its C-terminal domain results essentially from H36-Fe 
coordination. However, as previously observed [15], we 
found experimentally that iron-less protein samples of 
Slr1738 were still able to bind DNA, unlike what was 
observed for the PerR protein where iron is crucial to 
DNA binding [6,8]. 

The resulting 3D model of Slr1738 comprises two 
well-defined domains (Figure 1B). The N-terminal 
domain from amino acids 1 to 84 contains four 
α-helices (H1 to H4) and a typical helix-turn-helix motif 
involving helices H3 and H4 that is responsible for the 
binding to DNA [50]. The helix H4 is the recognition 
helix that interacts specifically with the major groove of 
DNA. The two H3 and H4 helices are followed by two 
anti-parallel β-strands (S1 and S2) that form a winged 
helix-turn-helix motif (wHTH) [51]. The C-terminal 
domain of Slr1738 from amino acid 85 to 139 comprises 
three β-sheets (S3 to S5) and one α-helix (H5) involved 
in the dimerization of Slr1738. 

The Slr1738 dimer model we propose (Figure 1C) was 
first built from the Pseudomonas aeruginosa FUR 
structural information of Pohl et al. [20]. Our model is 
consistent with previous findings showing that the active 
form of FUR-type regulators is a dimer [52,53] and that 
Slr1738 too was proposed to bind to DNA as a dimer 
[15]. We then refined our model with the structure of 
the Bacillus subtilis PerR protein (PDB ID 2FE3 without 
Fe and 3F8N with Mn replacing Fe) [7,8]. The final 
RMSD values for the heavy atoms of the backbone were 
about 15 Å and 4 Å respectively for the refined 
structures, indicating the great role of iron in modifying 
the global folding. The zinc metal ion appears to be 
indispensable for structuring the C-terminal domain to 
enable dimerization [49]. The interactions involved at 
the interface of the dimer structure are depicted in 
Additional file 8 Figure S6. 
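
For readers unfamiliar with the RMSD values quoted above, the quantity can be sketched as follows for two sets of matched backbone heavy-atom coordinates. This minimal example assumes that the two structures have already been superposed; it performs no fitting, and the coordinates shown are hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation (Å) between two matched lists of (x, y, z) points.

    Assumes both structures are already superposed and list atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
print(rmsd(reference, model))
```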

Currently there is no 3D structure in the PDB data- 
base of a FUR/PerR-like regulator complexed with 
DNA. Therefore, we developed an approach, detailed in 
the Methods section, based on experimental 3D struc- 
tures deposited in databases for predicting the structure 
of protein-DNA complex. Our model is consistent with 
the above-mentioned findings that Sir 1738 binds to the 
310 bp-long promoter region of the divergent genes 
sill 621 and sir 1738 that contains a long AT-rich motif 
important for Slrl738-mediated regulation (Figure 3), 
which harbors a central symmetry for double strands, /. 
e. an ideal feature for binding a dimeric protein acting 
like pliers. Consequently, we chose a symmetrical 25 bp 
AT-rich DNA fragment for our modelling purposes, in 
agreement with the fact that the well-defined FUR-bind- 
ing DNA sequences usually contain AT-rich palin- 
dromes [54]. We obtained the protein-DNA complex by 
fitting the (Slrl738)2 model onto well positioned mono- 
mers using a strategy based on structural analogy with 
experimental structures, i.e. the recognition helices of 
the HTH motif being inserted into the major groove of 
target DNA. The selection of the final model was based
on both structural criteria and energy calculations. Our
best complex model is the one based on the structural
pattern derived from 1SAX.
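
The central symmetry invoked for the target DNA can be checked on any candidate fragment with a few lines of Python. The sketch below scores how many mirrored positions of a fragment are Watson-Crick complementary, which is what an inverted repeat requires. The function names are ours, and the 25 bp fragment is invented for illustration; it is not the Slr1738 operator sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def symmetric_fraction(seq):
    """Fraction of mirrored position pairs (i, n-1-i) that are complementary.

    For an odd-length fragment the central base pair lies on the dyad
    axis and is ignored.
    """
    n = len(seq)
    pairs = n // 2
    if pairs == 0:
        raise ValueError("sequence too short")
    hits = sum(1 for i in range(pairs) if seq[i] == COMPLEMENT[seq[n - 1 - i]])
    return hits / pairs

# Hypothetical AT-rich half-site, a central base, then the mirrored half-site.
half_site = "AATTATAAATTA"
fragment = half_site + "T" + reverse_complement(half_site)

print(len(fragment), symmetric_fraction(fragment))
```

A score of 1.0 corresponds to a perfect inverted repeat; natural operators are usually imperfect, so a threshold below 1.0 would have to be chosen case by case.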

Considerations of DNA conformation 

In our approach, we did not take into account the DNA 
structural specificities possibly imposed by the oligonu- 
cleotide sequence. Indeed, we kept the DNA backbone 
structure of the template model to maintain the struc- 
tural conformations occurring at the protein-DNA inter- 
face. Hence, we selected the template structures based 
on the wHTH motif identified in the Slr1738 protein.
This choice was guided by our long-standing
expertise concerning DNA structure [55-59]. DNA
molecules are very flexible and can adopt many
local conformations [60] depending on both their
nucleotide sequence and the physico-chemical
environment, as discussed in many papers. As mentioned
above, in the absence of a genuine consensus sequence for
the binding of Slr1738 onto DNA, we used the AT-rich
regulatory element emerging from our work (Figure 3).
Such an AT-rich DNA sequence may of course adopt
particular local conformations that are not easy to
predict and integrate in a 3D structure. Nevertheless, we
note that: 1) the final structure was energy-minimized
while allowing the DNA to relax; 2) MD simulations were
performed to allow the structure, and thereby the DNA
molecule, to explore 3D space; 3) the four 3D structures
retained for the final selection (Table 1) differ widely in
the AT content of their target DNA sequences, which
ranges from 48% for 1U8R to 86% for 1Z9C. The template
structure we finally retained, i.e. 1SAX, is 68% AT-rich.
Furthermore, all these DNA structures, analyzed
with the 3DNA program [61], were found to adopt a
global B-form without bending in spite of their widely
different AT contents; 4) it remains difficult
to predict local DNA conformations from nucleotide
sequences, unlike the situation for proteins, whose
secondary structure is predictable from their primary
sequence. Thus, we can be more confident in
selecting a protein template than a DNA template.
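
The AT contents compared above are simple base counts. A minimal Python sketch is given below; the function name is ours and the example sequences are hypothetical, since the template DNA sequences themselves are not reproduced in this section.

```python
def at_percent(seq):
    """Percentage of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    at_count = sum(1 for base in seq if base in "AT")
    return 100.0 * at_count / len(seq)

# Hypothetical fragments, used only to exercise the function.
for name, dna in [("mixed", "ATGCATGC"), ("at_rich", "AATTATAAGT")]:
    print(name, round(at_percent(dna), 1))
```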

Structural analysis of the complex formed by the Slr1738 
dimer and its DNA target 

Recognition of a DNA sequence is mediated both by
direct interactions between amino acids and the bases in
the major DNA groove (direct readout) and by contacts
with the DNA backbone (indirect readout). Specific
interactions between Slr1738 and its target DNA mainly
involve the residues at the top of the recognition
helices and ca. ten bases. Even though the dimeric form
of Slr1738 docks onto an oligonucleotide 25 bp in length,
each Slr1738 monomer specifically interacts with only 5
bases, in a symmetric way. The interactions are
schematically represented (see Additional file 9 Figure
S7) inside a simple 7-1-7 bp DNA motif, the minimal
recognition motif for FUR binding [62].
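
To make the distinction between base contacts and backbone contacts concrete, the sketch below sorts protein-DNA contacts according to the DNA atom involved, using standard PDB atom names (sugar atoms carry a prime, phosphate atoms are P, OP1, OP2). The function names and the contact list are hypothetical and serve only as an illustration; they do not reproduce the contacts of our model.

```python
# Phosphate atom names in current (OP1/OP2) and legacy (O1P/O2P) PDB conventions.
PHOSPHATE_ATOMS = {"P", "OP1", "OP2", "OP3", "O1P", "O2P"}

def dna_atom_class(atom_name):
    """Classify a DNA atom as 'backbone' (sugar or phosphate) or 'base'."""
    name = atom_name.strip()
    if name in PHOSPHATE_ATOMS or name.endswith("'") or name.endswith("*"):
        return "backbone"
    return "base"

def classify_contacts(contacts):
    """Group (protein_residue, dna_atom_name) pairs by DNA atom class."""
    summary = {"base": [], "backbone": []}
    for residue, atom in contacts:
        summary[dna_atom_class(atom)].append((residue, atom))
    return summary

# Hypothetical contact list, for illustration only.
contacts = [("R18", "OP2"), ("Y59", "OP1"), ("S60", "C7"), ("T57", "O4"), ("K63", "O4'")]
for kind, items in classify_contacts(contacts).items():
    print(kind, items)
```

Contacts in the "base" class are candidates for sequence-specific recognition, whereas those in the "backbone" class mainly contribute to affinity and positioning.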

The final model of the complex predicts six contact
regions between the Slr1738 dimer and its
target DNA. Each monomer contributes three of
them via three sub-regions, namely: (i) the loop
between H1 and H2, (ii) the α-helix H4 and (iii) the loop
between S1 and S2 (Figure 4A), which are discussed
below.

(i) The loop between H1 and H2. This region of
Slr1738 comprises 11 amino acids (residues 13-23,
KERGLRVTPQR) including 4 positively charged residues.
Two of them (R18 and R23) are presumably involved in
electrostatic interactions with atoms of the negatively
charged phosphate groups of the DNA backbone in the minor
groove. Sequence alignments of FUR-like proteins show
that K13, R18 and R23 are highly conserved, especially
among cyanobacterial Slr1738 orthologs (Additional file
7 Figure S5). This first region establishes only
non-specific interactions with DNA, in that there is no
contact of protein side-chains with DNA bases. We propose
that these long-range electrostatic interactions operate in
the pre-orientation of the DNA-binding protein domain to
facilitate protein/DNA interactions, and in the
stabilization of the resulting Slr1738/DNA complex that
precedes the sequence-selective interaction mediated by
the H4 protein helix.

(ii) The H4 recognition helix. H4A and H4B, the respective
recognition helices of monomers A and B, are
inserted into the major DNA groove where they likely
act as a pair of pliers. A detailed description of the H4
helix is informative in this respect. H4 comprises
14 amino acids (residues 55-68) organized in three
different sides (Figure 4B). The first side gathers the
hydrophobic residues (V58, L62 and L65), which lie in direct
contact with numerous hydrophobic residues of the three other
helices of the N-terminal domain, thereby forming a
compact hydrophobic cluster. The second side, composed
of 7 small residues (A56, T57, S60, S61, A64, S67
and V68), is close to DNA strand 2. The third side,
comprising bulky and charged residues (Q55, Y59, K63
and Q66), faces DNA strand 1. The residues in close
vicinity to the DNA bases are mainly those of the upper
part of the helix, which are located on sides two and
three: Q55, A56, T57, Y59, S60 and K63. Sequence
alignment of FUR proteins shows that the DNA recognition
helix always contains a conserved amino acid motif,
(A)TVY or (A)TIY, in which the tyrosine is important for
DNA binding [36]. In Slr1738, the H4 helix also has
four hydroxyl-containing residues (serine and threonine
residues), a feature possibly important because this




Figure 4 DNA-(Slr1738)2 complex structure properties. A) 3D structure of the DNA-(Slr1738)2 complex showing six contact regions between the
protein (in grey) and the DNA molecule (in cyan). For each monomer, the predicted contact regions are the loop between H1 and H2 (in
orange), the helix H4 (in red) and the loop between S1 and S2 (in yellow). B) Helical wheel representation of the recognition helix (H4) of
monomer A. Three faces with different properties are presented. The hydrophobic residues turned toward the protein are shown in soft pink,
the small residues facing DNA strand 2 in dark pink, and the large and charged residues interacting with DNA strand 1 in red.
Hydrophilic residues are shown as circles, hydrophobic residues as diamonds and potentially positively charged residues as pentagons
(http://rzlab.ucr.edu/scripts/wheel/wheel.cgi).
DNA contact region seems to be the only one involved
in sequence recognition.
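
The three faces of the H4 helix can be recovered from an ideal helical wheel, in which successive residues of an α-helix are rotated by 100° (3.6 residues per turn). The Python sketch below uses the residue identities listed in the text for positions 55-68; the variable and function names are ours, and the ideal 100° step is an assumption that ignores local distortions of the real helix.

```python
H4_START = 55
H4_SEQUENCE = "QATVYSSLKALQSV"   # residues 55-68, as listed in the text
DEGREES_PER_RESIDUE = 100.0      # ideal alpha-helix, 3.6 residues per turn (assumed)

# Face assignment taken from the text.
FACES = {
    "hydrophobic, protein core": [58, 62, 65],
    "small, DNA strand 2": [56, 57, 60, 61, 64, 67, 68],
    "bulky/charged, DNA strand 1": [55, 59, 63, 66],
}

def wheel_angle(position, start=H4_START):
    """Angular position (degrees) of a residue on an ideal helical wheel."""
    return ((position - start) * DEGREES_PER_RESIDUE) % 360.0

def face_arc(positions):
    """Smallest and largest wheel angle spanned by a set of residues."""
    angles = [wheel_angle(p) for p in positions]
    return min(angles), max(angles)

for face, positions in FACES.items():
    labels = [f"{H4_SEQUENCE[p - H4_START]}{p}" for p in positions]
    print(face, labels, face_arc(positions))
```

With these assumptions the three groups of residues fall into three non-overlapping arcs of the wheel (roughly 0-80°, 100-240° and 280-340°), in line with the three-sided description given for Figure 4B.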

(iii) The loop between S1 and S2. The third region of
Slr1738 in contact with DNA contains three negatively
charged glutamic acids (EVLLEEGVC) that may
hinder the approach of the S1 and S2 anti-parallel
strands to the negatively charged DNA backbone. However,
divalent cations such as Mg²⁺ can bridge these
negatively charged groups (phosphate group and glutamic
acid side chain), allowing the approach of the S1 and S2
anti-parallel strands. In our model, E78 is very well
positioned to form such an electrostatic bridge.
Furthermore, the iron-binding regulatory site is known to
allow proper folding of Slr1738. In particular, its
coordination by H36 induces correct contacts between the
N-terminal and C-terminal domains. Thus, the modification
of this site by oxidation could break the link between the
N- and C-terminal domains, thereby preventing the damaged
protein from binding DNA [6]. Consequently, we propose
that the negatively charged amino acids of this
DNA-binding region participate, via strong electrostatic
repulsions with the DNA backbone, in unlocking Slr1738 by
inducing a rotation of its N-terminal domain.
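
The opposite electrostatic character of the H1-H2 and S1-S2 loops can be summarized by their formal side-chain charge. The sketch below applies a deliberately crude counting rule (+1 for K and R, -1 for D and E, histidine and termini ignored) to the two loop sequences quoted in the text; the function name and the counting rule are our simplifying assumptions, not a substitute for a proper electrostatics calculation.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")

def formal_charge(peptide):
    """Net formal side-chain charge near neutral pH (K/R = +1, D/E = -1)."""
    positives = sum(1 for aa in peptide if aa in POSITIVE)
    negatives = sum(1 for aa in peptide if aa in NEGATIVE)
    return positives - negatives

LOOPS = {
    "H1-H2 loop": "KERGLRVTPQR",
    "S1-S2 loop": "EVLLEEGVC",
}

for name, sequence in LOOPS.items():
    print(name, sequence, formal_charge(sequence))
```

Under this rule the H1-H2 loop carries a net charge of +3 and the S1-S2 loop a net charge of -3, which is consistent with attraction to, and repulsion from, the phosphate backbone, respectively.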

During the MD simulations, the three contact zones
described above show different behaviours. The first one,
the loop between H1 and H2, remains stable thanks to the
presence of residues R18, Q22 and R23 in both monomers
(A and B), which establish strong electrostatic
interactions with the phosphate atoms of the target DNA that
were maintained during the whole simulation. Thus, the
dimeric form of Slr1738 makes six non-specific
electrostatic interactions that maintain the integrity of the
DNA-protein complex. The second region, comprising
the recognition helix H4 of each monomer, behaved
asymmetrically in the two monomers, in agreement
with the fact that the H4A and H4B helices interact
with non-similar target DNA sequences. In our model
structure, only the H4A helix of monomer A of
Slr1738 has its tyrosine residue at a suitable distance from
the pair of thymines of the target DNA, which are
known to interact strongly with tyrosine [33,36]. The
hydroxyl group of the tyrosine is involved in a hydrogen
bond with the phosphodiester backbone, while its
aromatic ring faces the methyl groups of the adjacent
thymines. Moreover, the recognition helix and the turn
preceding it contain five serines and two threonines, the
repetition of which might operate in sequence-specific
recognition. Indeed, we noticed during MD simulations
that serine or threonine could contact a thymine residue
via a specific dual interaction. First, the close vicinity of
the methyl group of thymine and the CH2 group of
serine or the CH3 group of threonine allows the formation
of a long-range hydrophobic interaction. Second, a
hydrogen bond can form between the hydroxyl group of
serine or threonine and the carbonyl group of the
thymine. Concerning the third Slr1738-DNA contact region,
which corresponds to the loop between S1 and S2, we
observed no significant difference in the mobility of the
Slr1738 protein bound or unbound to DNA. The three
glutamic acids in this region induce the DNA to move
slightly away from the antiparallel strands S1 and S2.
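
A common way to quantify whether a contact is "maintained during the whole simulation" is its occupancy, i.e. the fraction of trajectory frames in which the relevant distance stays below a cutoff. The sketch below illustrates the idea; the 4.0 Å cutoff, the function name and the distance series are hypothetical and do not come from our MD trajectories.

```python
def contact_occupancy(distances, cutoff=4.0):
    """Fraction of frames in which the distance (angstroms) is within the cutoff."""
    if not distances:
        raise ValueError("no frames supplied")
    return sum(1 for d in distances if d <= cutoff) / len(distances)

# Hypothetical per-frame distances between a side chain and a phosphate oxygen.
stable_contact = [2.8, 2.9, 3.0, 2.7, 3.1, 2.9]
transient_contact = [3.0, 5.2, 6.1, 3.4, 7.0, 6.5]

print(contact_occupancy(stable_contact))
print(round(contact_occupancy(transient_contact), 2))
```

Comparing occupancies between equivalent residues of monomers A and B would be one simple way to express the asymmetric behaviour of the two H4 helices numerically.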

Multimer complex hypothesis 

The FUR box consensus sequence classically defined as 
a 19 bp inverted repeat sequence [63] binding a dimeric 
FUR protein is regarded by some authors as a 15 bp 
region with a 7-1-7 motif [62] binding a tetrameric 
(dimer of dimers) FUR. It was also shown that FUR 
does not bind to PerR boxes though they share with 
FUR boxes six identical nucleotides within each hepta- 
mer. A model of two Pseudomonas aeruginosa FUR 
dimers binding a canonical B-DNA was also proposed 
by Pohl et al. [20] to take into account that multiple 
FUR proteins protect a larger DNA region (at least 27- 
30 bp) than a single FUR dimer (around 20 bp). In this 
model, the two FUR dimers are located on opposite 
sides of the DNA molecule. Escolar et al. [64] have
reinterpreted the 19 bp consensus FUR binding site as an
array of three repeats of the invariable 6 bp GATAAT
sequence in Escherichia coli, while in vitro gel shift and
DNase footprinting assays led Lavrrar et al. to propose
that three FUR dimers (hexamer form) may bind to the
19 bp FUR box [65,66]. The possible occurrence of
different forms of FUR and PerR regulators binding with
different affinities onto their target genes may explain 
why some of them are not always co-regulated, depend- 
ing on the environmental conditions. 

Considering these interpretations and our 3D model,
we calculated the DNA surface buried by Slr1738 with
the NACCESS program [67] and found that the PerR-like
regulator Slr1738 likely protects 21 bp of its target
DNA, a value close to those mentioned above for FUR/
PerR.
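
The number of protected base pairs can be derived from accessibility values by comparing the free DNA with the DNA in the complex. The sketch below assumes that per-base-pair solvent-accessible areas have already been obtained from an accessibility program; the function names, the 1 Å² threshold and the numbers are hypothetical and only illustrate the bookkeeping.

```python
def buried_area(free_sasa, bound_sasa):
    """Per-base-pair area (square angstroms) buried upon protein binding."""
    if len(free_sasa) != len(bound_sasa):
        raise ValueError("profiles must have the same length")
    return [max(free - bound, 0.0) for free, bound in zip(free_sasa, bound_sasa)]

def protected_span(buried, threshold=1.0):
    """Length in bp from the first to the last base pair buried above threshold."""
    touched = [i for i, area in enumerate(buried) if area > threshold]
    if not touched:
        return 0
    return touched[-1] - touched[0] + 1

# Hypothetical accessibility profiles for an 8 bp stretch.
free = [100.0] * 8
bound = [100.0, 100.0, 60.0, 100.0, 40.0, 70.0, 100.0, 100.0]

buried = buried_area(free, bound)
print(buried)
print(protected_span(buried))
```

Counting the span between the outermost buried base pairs, rather than only the buried base pairs themselves, mimics the way a footprinting experiment reports a continuous protected region.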

Interestingly, while Slr1738 represses both
slr1738 and sll1621 in normal conditions (see above),
these two genes are not always co-regulated [68] since,
for instance, slr1738 but not sll1621 is positively
regulated by cadmium [14]. Consequently, by analogy with
FUR and PerR regulators, we decided to explore the
idea that several dimers of Slr1738 could bind to the
sll1621-slr1738 promoter region with different patterns,
thereby accounting for the similar or different regulation
of the slr1738 and sll1621 genes depending on the
stress. This idea was reinforced by the observation that
Slr1738-like metalloregulators may occur as multimers
(dimers, trimers or tetramers) under different redox
conditions [69]. Therefore, we built different 3D
structures of the Slr1738/target DNA complex by changing
the number of and/or spacing between the Slr1738
dimers bound to the DNA. In practice, we tested all
possible DNA-binding modes of Slr1738 tetramers
(dimer of dimers) by fixing the first dimer and moving
the second one along the DNA in one base pair
increments so as to steadily increase the spacing between
the dimers. This strategy enabled us to (i) structurally
validate the different models; (ii) characterize the
contact surfaces between each dimer; and (iii) examine
more complex multimeric models. As expected, the
interaction surface between the DNA and the proteins
calculated for our models varies significantly when
changing the spacing length between dimers (Additional
file 10 Table S3). The buried DNA surface decreases
overall as the spacing between dimers increases,
until it stabilizes at around 3300 Å².
The hypothetical tetrameric models corresponding
to a dimer spacing of 1, 2, 8, 9, 10 or 11 bp
are unrealistic because the resulting large overlap
between Slr1738 dimers would generate severe
steric clashes. Other model combinations without steric
clashes can be divided into two groups. The first group
comprises the tetramers with no contact surface
between their dimers, as occurs in the 5 bp spacer model
for the FUR regulator [20], and in models with a spacing
of 4, 14, 15, 16, 22 bp or more. The second group
includes models with a spacing of 3, 6, 7, 12, 13, 18, 19,
20 or 21 bp, for which we observe a contact surface
between the dimers. With our model of (Slr1738)2, 21
bp is the maximum offset allowing inter-dimer contacts.
Among all these favourable combinations, offsets of 3
and 12 bp show the highest contact surface between
dimers, possibly yielding a more stable tetrameric
complex. However, the 6 bp-spacing model may have
biological relevance, though the contact surface between
dimers is smaller than in other constructions. This
particular model, where the protein tetramer contacts both
faces of the DNA helix, is fully compatible with the 7-1-7
inverted repeats of the Escherichia coli FUR target
DNA, as well as with the DNase I footprinting and gel shift
evidence for an overlap between
Escherichia coli FUR dimers bound to DNA [65,66].
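
The sliding procedure described above can be pictured with ideal B-DNA geometry: moving a dimer by one base pair translates it along the helix axis and rotates it around that axis. The sketch below tabulates this placement together with the outcome reported in the text for each offset. The rise (3.4 Å) and helical repeat (10.5 bp per turn) are textbook B-DNA values that we assume here; the function names are ours, and offsets not classified in the text are reported as such rather than guessed.

```python
RISE_PER_BP = 3.4     # angstroms per base pair, ideal B-DNA (assumed)
BP_PER_TURN = 10.5    # helical repeat of B-DNA in solution (assumed)

# Outcomes reported in the text for the second dimer.
CLASH = {1, 2, 8, 9, 10, 11}
CONTACT = {3, 6, 7, 12, 13, 18, 19, 20, 21}
NO_CONTACT = {4, 5, 14, 15, 16}   # 5 bp refers to the FUR spacer model of ref. [20]

def placement(offset_bp):
    """Translation (angstroms) and rotation (degrees) for a given offset in bp."""
    shift = offset_bp * RISE_PER_BP
    rotation = (offset_bp * 360.0 / BP_PER_TURN) % 360.0
    return shift, rotation

def reported_class(offset_bp):
    """Outcome stated in the text, or 'not reported' when the text is silent."""
    if offset_bp in CLASH:
        return "steric clash"
    if offset_bp in CONTACT:
        return "inter-dimer contact"
    if offset_bp in NO_CONTACT or offset_bp >= 22:
        return "no inter-dimer contact"
    return "not reported"

for offset in range(1, 23):
    shift, rotation = placement(offset)
    print(f"{offset:2d} bp  {shift:5.1f} A  {rotation:5.1f} deg  {reported_class(offset)}")
```

Under these assumptions a 6 bp offset rotates the second dimer by about 206°, i.e. towards the opposite face of the helix, whereas a 21 bp offset corresponds to two full turns and places both dimers on the same face.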

We also constructed more complex structural systems
describing higher-order multimeric regulators, i.e.
hexamers and more, and found two structures that may
account for some biological observations. The first
multimer structure is a hexamer composed of three dimers
with a spacing of either 3 bp (0-3-6 model) or 6 bp
(0-6-12 model). These two models allow the formation of a
hexamer-DNA complex with no steric clash, which may
operate in the co-regulation of the two genes slr1738
and sll1621, as shown for the 3 bp-spacing model in
Figure 5. Detailed structural interactions between each
dimer in these hexamer models are shown in Additional
file 11 Figure S8. Schematic representations of the
sll1621-slr1738 promoter region in these models predict
that the binding of Slr1738 will preclude the binding of
sigma factors, thereby impairing the transcription of the
slr1738 and sll1621 genes. Furthermore, in this switch
mechanism the two Slr1738 dimers bordering the
hexameric regulator might be alternately released in some
environmental conditions, thereby allowing the specific
transcription of either sll1621 or slr1738. The second
particular multimeric structure, possibly occurring in
response to a high concentration of the FUR regulator
[65,70], involves polymerization of multiple FUR
proteins along their target DNA molecule, which is
thereafter no longer accessible to other enzymes. Such
polymerization is only possible when a 6 bp spacing
(0-6-12 model) occurs between the FUR dimers (Figure 6).
It could explain the structural organization of
FUR-binding sites with repeated DNA sequence
(GATAAT-GATAAT)n. By contrast, in the 0-3-6 hexameric model,
this polymerization is not possible because a spacing of
about 21 bp is necessary to allow the binding of the
next hexamer onto the DNA helix.
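
The DNA length covered by such multimers follows from simple arithmetic if each dimer is assumed to protect a contiguous 21 bp window (the value estimated above) and successive windows are shifted by the dimer offset. The sketch below encodes this assumption; the function name is ours and the model ignores any deformation of the DNA or of the proteins.

```python
DIMER_FOOTPRINT_BP = 21   # protected length per dimer, as estimated in the text

def multimer_footprint(n_dimers, offset_bp, dimer_footprint=DIMER_FOOTPRINT_BP):
    """DNA length (bp) covered by n dimers whose windows start offset_bp apart.

    Assumes contiguous, overlapping or abutting windows (offset <= footprint).
    """
    if n_dimers < 1:
        raise ValueError("at least one dimer is required")
    if offset_bp > dimer_footprint:
        raise ValueError("windows would leave unprotected gaps")
    return dimer_footprint + (n_dimers - 1) * offset_bp

print(multimer_footprint(3, 3))   # hexamer, 0-3-6 model
print(multimer_footprint(3, 6))   # hexamer, 0-6-12 model
```

With these assumptions the 0-3-6 and 0-6-12 hexamers cover 27 bp and 33 bp, respectively, which matches the protected regions indicated for the two models in Figure 5.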

Conclusions 

In the current post-genomic era, the thousands of 3D
protein structures available in the PDB database can be
used as templates to predict the possible fold of
structurally uncharacterized proteins of biological interest.
This task is important for mind-oriented analysis of the
interactions between these proteins and their DNA and/or
protein partners. In this framework, we used a combination
of different types of data (structural, biochemical and
genetic) to build and validate a relevant model of the
tridimensional structure of a biologically important
protein-DNA complex. This complex plays a central role in
the regulation of cyanobacteria (environmentally crucial
organisms) by interdependent oxidative and metal
stresses. It is formed between the PerR/FUR-like Slr1738
regulator and its main DNA target, i.e. the promoter region
of the divergent genes slr1738 and sll1621, the latter
encoding a peroxiredoxin (anti-oxidant). The detailed
analysis of the Slr1738/DNA complex, and of the related
complexes between genuine PerR/FUR-type regulators and
their target DNA, allowed us to better understand the
interactions involved in protein-DNA recognition
and in the stability/integrity of the structure.

The method we propose in this work to build the
(Slr1738)2/DNA complex, which is based on fold-level
similarity between DNA-binding domains to transfer
the DNA orientation from a co-complex structure to a
protein-only model, needs to be formalized and automated
so that other researchers can apply it to build
other protein/DNA complexes of interest. We also
Figure 5 3D structure of the hexameric model. A) 3D structure of the hexamer bound to a 44 bp DNA, built with a spacing of 3 bp between
dimers. B) Schematic representation of the DNA intergenic region between the sll1621 and slr1738 genes protected by Slr1738 hexamer
binding. The transcription start site and the promoter sequence containing both -10 and -35 boxes are reported for each gene. Dotted lines
mark the boundaries of the DNA fragment in the model. The black bold frame indicates the 27 bp region protected by the Slr1738 hexamer in the case
of the 0-3-6 model; the grey bold frame indicates the 33 bp region protected in the case of the 0-6-12 model. Green lines represent the approximate DNA
region covered by the sigma factor protein that is necessary for RNA polymerase recruitment.
develop a simple theoretical strategy to predict more
complex structures involving the binding of multiple
dimers of regulators to the same molecule of DNA. In
this strategy, we consider each dimer as a molecular
building block that can be moved along the target DNA
molecule to hug it, and combine the resulting potential
structures with footprinting and gel shift data to propose
attractive model structures. One of these, a switch
model, may explain a biological mechanism, i.e. the
presence/absence of coordinated expression of the genes
co-regulated by the same Slr1738/FUR/PerR-type regulator,
depending on the environmental conditions. We
believe that our strategy for studying protein/DNA
interactions will help to decipher the molecular basis of
a wealth of regulatory mechanisms that are crucial for
life.